53 research outputs found
Efficient HTTP based I/O on very large datasets for high performance computing with the libdavix library
Remote data access for data analysis in high performance computing is
commonly done with specialized data-access protocols and storage systems. These
protocols are highly optimized for high throughput on very large datasets,
multi-stream transfers, high availability, low latency and efficient parallel I/O. The
purpose of this paper is to describe how we have adapted a generic protocol,
the Hypertext Transfer Protocol (HTTP), to make it a competitive alternative
for high performance I/O and data analysis applications in a global computing
grid: the Worldwide LHC Computing Grid. In this work, we first analyze the
design differences between the HTTP protocol and the most common high
performance I/O protocols, pointing out the main performance weaknesses of
HTTP. Then, we describe in detail how we solved these issues. Our solutions
have been implemented in a toolkit called davix, available through several
recent Linux distributions. Finally, we describe the results of our benchmarks
where we compare the performance of davix against an HPC-specific protocol for a
data analysis use case.
Comment: Presented at: Very Large Data Bases (VLDB) 2014, Hangzhou
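One of the design differences the paper alludes to is that HTTP has no native vectored-read primitive, so a scattered read must be mapped onto byte-range requests. As an illustration of the kind of optimization involved (a minimal sketch, not davix's actual code; the function name and the `max_gap` tuning threshold are our own assumptions), nearby byte ranges can be coalesced so that a scattered read needs far fewer Range requests:

```python
def coalesce_ranges(ranges, max_gap=4096):
    """Merge byte ranges whose gap is <= max_gap bytes, reducing the
    number of HTTP Range requests needed for a scattered read.
    'ranges' is a list of (start, end) offset pairs."""
    if not ranges:
        return []
    # Sort by start offset, then sweep and merge adjacent/near ranges.
    ranges = sorted(ranges)
    merged = [list(ranges[0])]
    for start, end in ranges[1:]:
        if start - merged[-1][1] <= max_gap:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]

# A coalesced read can then be issued as a single multi-range GET, e.g.
# headers = {"Range": "bytes=" + ",".join(f"{s}-{e}" for s, e in merged)}
```

The trade-off is transferring a few unwanted bytes in the merged gaps versus paying one request round-trip per fragment, which is exactly where a generic protocol loses ground to HPC I/O protocols with native vector reads.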
Plotting the Differences Between Data and Expectation
This article proposes a way to improve the presentation of histograms where
data are compared to expectation. Sometimes, it is difficult to judge by eye
whether the difference between the bin content and the theoretical expectation
(provided by either a fitting function or another histogram) is just due to
statistical fluctuations. More importantly, there could be statistically
significant deviations which are completely invisible in the plot. We propose
to add a small inset at the bottom of the plot, in which the statistical
significance of the deviation observed in each bin is shown. Although the
numerical routines we developed serve only illustrative purposes, they are based
on formulae which could be used to perform statistical inference in a proper
way. An implementation of our computation is available at
https://github.com/dcasadei/psde.
Comment: 10 pages, 7 figures. CODE: https://github.com/dcasadei/psde
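The per-bin quantity such an inset displays can be sketched minimally: convert the Poisson tail probability of each bin's observed count into an equivalent Gaussian z-value, signed so that excesses are positive and deficits negative. This is an illustration in the spirit of the paper, not the psde code itself; the function name and conventions are our own.

```python
from statistics import NormalDist
import math

def bin_significance(observed, expected):
    """Signed z-value of the deviation of an observed Poisson count from
    its expectation: the exact Poisson tail probability converted into a
    Gaussian number of sigmas (positive for excess, negative for deficit)."""
    if observed >= expected:
        # p-value of observing >= 'observed' counts
        p = 1.0 - sum(math.exp(-expected) * expected**k / math.factorial(k)
                      for k in range(observed))
        return NormalDist().inv_cdf(1.0 - p) if p > 0 else float("inf")
    else:
        # p-value of observing <= 'observed' counts
        p = sum(math.exp(-expected) * expected**k / math.factorial(k)
                for k in range(observed + 1))
        return -NormalDist().inv_cdf(1.0 - p)
```

Plotting this value per bin makes a statistically significant deviation visible even when the histogram and its expectation overlap to the eye.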
\sqrt{\hat{s}}_{min} resurrected
We discuss the use of the variable \sqrt{\hat{s}}_{min}, which has been proposed
as a way to measure the hard scale of a multi-parton final-state event using
inclusive quantities only, on a SUSY data sample for a 14 TeV LHC. In its
original version, where the variable was defined at calorimeter level, the
direct correlation to the hard-scattering scale does not survive once effects
from soft physics are taken into account. We show here that, by using
reconstructed objects instead of calorimeter energies and momenta as input,
this correlation is recovered for the parameter point considered. We
furthermore discuss the effect of including W+jets and t tbar+jets backgrounds
in our analysis, and the use of \sqrt{\hat{s}}_{min} for the suppression
of SM-induced backgrounds in new-physics searches.
Comment: 23 pages, 9 figures; v2: 1 figure, several subsections and references
as well as new author affiliation added. Corresponds to published version
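In its commonly quoted form (our paraphrase, not taken verbatim from this abstract), the variable is \sqrt{\hat{s}}_{min}(M_inv) = \sqrt{E_vis^2 - P_{z,vis}^2} + \sqrt{M_inv^2 + MET^2}, built from the total visible energy and longitudinal momentum plus the missing transverse momentum, for an assumed total invisible mass M_inv. A minimal sketch, computed from reconstructed objects as the paper advocates:

```python
import math

def sqrt_shat_min(visible_objects, m_invisible=0.0):
    """Lower bound on the hard-scattering mass scale from inclusive
    quantities only: sqrt(E_vis^2 - Pz_vis^2) + sqrt(M_inv^2 + MET^2).
    Each visible object is a 4-momentum tuple (E, px, py, pz); the missing
    transverse momentum is taken to balance the visible system."""
    E = sum(o[0] for o in visible_objects)
    px = sum(o[1] for o in visible_objects)
    py = sum(o[2] for o in visible_objects)
    pz = sum(o[3] for o in visible_objects)
    met2 = px**2 + py**2  # MET balances the visible transverse momentum
    return math.sqrt(E**2 - pz**2) + math.sqrt(m_invisible**2 + met2)

# Example: one visible object with E=100, pT=(30,40), pz=0 and massless
# invisibles gives sqrt(100^2) + sqrt(50^2) = 150 (GeV, say).
```

The paper's point is then about the inputs: summing calorimeter cells lets soft physics contaminate E and Pz, whereas summing reconstructed objects restores the correlation with the true hard scale.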
Type Ia supernova parameter estimation: a comparison of two approaches using current datasets
By using the Sloan Digital Sky Survey (SDSS) first year type Ia supernova (SN
Ia) compilation, we compare two different approaches (traditional \chi^2 and
complete likelihood) to determine parameter constraints when the magnitude
dispersion is to be estimated as well. We consider cosmological constant + Cold
Dark Matter (\Lambda CDM) and spatially flat, constant w Dark Energy + Cold
Dark Matter (FwCDM) cosmological models and show that, for current data, there
is a small difference in the best-fit values and an approximately 30% difference
in confidence-contour areas when the MLCS2k2 light-curve fitter is adopted. For
the SALT2 light-curve fitter the differences are less significant (up to a
13% difference in areas). In both cases the likelihood approach gives more
restrictive constraints. We argue for the importance of using the complete
likelihood instead of the \chi^2 approach when dealing with parameters in the
expression for the variance.
Comment: 16 pages, 5 figures. More complete analysis including peculiar
velocities and correlations among SALT2 parameters. Use of 2D contours
instead of 1D intervals for comparison. There can now be a significant
difference between the approaches, around 30% in contour area for MLCS2k2 and
up to 13% for SALT2. General streamlining of text and suppression of section
on model selection
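The distinction between the two approaches can be made concrete: when the magnitude dispersion sigma is itself a parameter to be estimated, the full Gaussian likelihood keeps a ln(sigma^2) normalization term that the traditional \chi^2 drops. A minimal sketch with toy residuals (function names are ours, not the paper's):

```python
import math

def chi2(residuals, sigma):
    """Traditional chi^2: no normalization term, so taken at face value it
    decreases monotonically as sigma grows, and cannot by itself pin down
    the dispersion."""
    return sum(r**2 / sigma**2 for r in residuals)

def minus_two_ln_like(residuals, sigma):
    """Full Gaussian -2 ln L: the n*ln(2*pi*sigma^2) term penalizes large
    sigma, so the dispersion can be estimated consistently alongside the
    cosmological parameters."""
    n = len(residuals)
    return chi2(residuals, sigma) + n * math.log(2 * math.pi * sigma**2)

# For fixed residuals, -2 ln L has a finite minimum at
# sigma^2 = mean(residuals^2), while chi^2 alone keeps falling with sigma.
```

This is exactly why the two methods can give different contour areas once the variance parameters are marginalized or profiled.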
Analysis of high-identity segmental duplications in the grapevine genome
Background: Segmental duplications (SDs) are blocks of genomic sequence of 1-200 kb that map to different loci in a genome and share a sequence identity > 90%. At the sequence level, SDs show the same characteristics as other regions of the human genome: they contain both high-copy repeats and gene sequences. SDs play an important role in genome plasticity by creating new genes and remodeling genome structure. Although data are plentiful for mammals, little was known about the representation of SDs in plant genomes. We therefore performed a genome-wide analysis of high-identity SDs on the sequenced grapevine (Vitis vinifera) genome (PN40024).
Results: We demonstrate that recent SDs (> 94% identity and >= 10 kb in size) are a relevant component of the grapevine genome (85 Mb, 17% of the genome sequence). We detected mitochondrial and plastid DNA and genes (10% of the gene annotation) in segmentally duplicated regions of the nuclear genome. In particular, the nine highest-copy-number genes have a copy in either or both organelle genomes. We further showed that several duplicated genes take part in the biosynthesis of compounds involved in the plant response to environmental stress.
Conclusions: These data show the great influence of SDs and organelle DNA transfers in shaping the Vitis vinifera nuclear DNA structure, as well as the impact of SDs in contributing to the adaptive capacity of grapevine and the nutritional content of grape products through genome variation. This study represents a step forward in the full characterization of duplicated genes important for grapevine cultural needs and human health.
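The identity thresholds that define an SD (> 90% in general, > 94% for the recent duplications studied here) refer to percent identity over a pairwise alignment. A minimal sketch of that computation (a hypothetical helper, not part of the study's pipeline):

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two equal-length aligned sequences (gaps as '-'):
    matching non-gap columns divided by all aligned columns, the kind of
    score compared against a ~90% threshold when calling SDs."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b)
                  if x == y and x != '-')
    aligned_cols = sum(1 for x, y in zip(aligned_a, aligned_b)
                       if x != '-' or y != '-')
    return 100.0 * matches / aligned_cols
```

Note that published SD pipelines differ in how they treat gaps and alignment length; the denominator here is one common convention, stated as an assumption.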
gSeaGen: The KM3NeT GENIE-based code for neutrino telescopes
Program summary
Program Title: gSeaGen
CPC Library link to program files: http://dx.doi.org/10.17632/ymgxvy2br4.1
Licensing provisions: GPLv3
Programming language: C++
External routines/libraries: GENIE [1] and its external dependencies. Linkable to MUSIC [2] and PROPOSAL [3].
Nature of problem: Development of a code to generate detectable events in neutrino telescopes, using
modern and maintained neutrino interaction simulation libraries which include the state-of-the-art
physics models. The default application is the simulation of neutrino interactions within KM3NeT [4].
Solution method: Neutrino interactions are simulated using GENIE, a modern framework for Monte
Carlo event generators. The GENIE framework, used by nearly all modern neutrino experiments, is
considered as a reference code within the neutrino community.
Additional comments including restrictions and unusual features: The code was tested with GENIE version
2.12.10 and is linkable with release series 3. Presently valid up to 5 TeV; this limitation is not intrinsic
to the code but reflects the current GENIE validity energy range.
References:
[1] C. Andreopoulos et al., Nucl. Instrum. Meth. A614 (2010) 87.
[2] P. Antonioli et al., Astropart. Phys. 7 (1997) 357.
[3] J. H. Koehne et al., Comput. Phys. Commun. 184 (2013) 2070.
[4] S. Adrián-Martínez et al., J. Phys. G: Nucl. Part. Phys. 43 (2016) 084001.
The gSeaGen code is a GENIE-based application developed to efficiently generate high-statistics samples
of events induced by neutrino interactions and detectable in a neutrino telescope. The gSeaGen code is
able to generate events induced by all neutrino flavours, considering topological differences between
track-type and shower-like events. Neutrino interactions are simulated taking into account the density
and the composition of the media surrounding the detector. The main features of gSeaGen are presented
together with some examples of its application within the KM3NeT project.
Funding: French National Research Agency (ANR) ANR-15-CE31-0020; Centre National de la Recherche
Scientifique (CNRS); European Union (EU); Institut Universitaire de France (IUF), France; IdEx program,
France; UnivEarthS Labex program at Sorbonne Paris Cite ANR-10-LABX-0023, ANR-11-IDEX-000502;
Paris Ile-de-France Region, France; Shota Rustaveli National Science Foundation of Georgia (SRNSFG)
FR-18-1268; German Research Foundation (DFG); Greek Ministry of Development-GSRT; Istituto Nazionale
di Fisica Nucleare (INFN); Ministry of Education, Universities and Research (MIUR), PRIN 2017 program,
Italy, NAT-NET 2017W4HA7S; Ministry of Higher Education, Scientific Research and Professional Training,
Morocco; Netherlands Organization for Scientific Research (NWO); Netherlands Government; National
Science Centre, Poland 2015/18/E/ST2/00758; National Authority for Scientific Research (ANCS), Romania;
Ministerio de Ciencia, Innovacion, Investigacion y Universidades (MCIU), Programa Estatal de Generacion
de Conocimiento, Spain (MCIU/FEDER) PGC2018-096663-B-C41, PGC2018-096663-A-C42, PGC2018-096663-B-C43,
PGC2018-096663-B-C44; Severo Ochoa Centre of Excellence and MultiDark Consolider (MCIU), Junta de
Andalucia, Spain SOMM17/6104/UGR; Generalitat Valenciana, Spain: Grisolia GRISOLIA/2018/119, GenT
CIDEGENT/2018/034; La Caixa Foundation LCF/BQ/IN17/11620019; EU MSC program, Spain 71367
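The statement that interactions are simulated taking into account the density and composition of the surrounding media can be illustrated with a toy sketch (not gSeaGen's actual algorithm; the layered-path model and function name are our assumptions): in the thin-target limit the interaction probability per unit path length is proportional to the local density, so a vertex position can be drawn by inverting the cumulative column depth along the neutrino path.

```python
import bisect
import random

def sample_vertex(segments, u=None):
    """Sample an interaction vertex along a neutrino path through layered
    media (e.g. seawater, then rock). 'segments' is a list of
    (length_m, density_gcm3) pairs; in the thin-target limit the
    interaction probability per unit length is proportional to density,
    so we invert the cumulative column depth. Returns a distance in m."""
    depths = []  # cumulative column depth at the end of each segment
    total = 0.0
    for length, rho in segments:
        total += length * rho
        depths.append(total)
    if u is None:
        u = random.random()  # uniform in [0, 1)
    target = u * total
    i = bisect.bisect_left(depths, target)
    # Distance travelled before segment i, then interpolate inside it.
    prev_depth = depths[i - 1] if i > 0 else 0.0
    prev_len = sum(s[0] for s in segments[:i])
    length, rho = segments[i]
    return prev_len + (target - prev_depth) / rho
```

With a water layer followed by denser rock, vertices correctly cluster in the rock: per metre of path, the denser medium is proportionally more likely to host the interaction.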
ROOT — A C++ framework for petabyte data storage, statistical analysis and visualization
This program has been imported from the CPC Program Library held at Queen's University Belfast (1969-2018)
Abstract
A new stable version (“production version”) v5.28.00 of ROOT [1] has been published [2]. It features several major improvements in many areas, most notably in data-storage performance as well as statistics and graphics features. Some of these improvements were already predicted in the original publication, Antcheva et al. (2009) [3]. This version will be maintained for at least 6 months; new minor revisions (“patch releases”) will be published [4] to solve problems reported with this version.
Title of program: ROOT
Catalogue Id: AEFA_v2_0
Nature of problem
Storage, analysis and visualization of scientific data
Versions of this program held in the CPC repository in Mendeley Data
AEFA_v1_0; ROOT; 10.1016/j.cpc.2009.08.005
AEFA_v2_0; ROOT; 10.1016/j.cpc.2011.02.00